Random Projections for $k$-means Clustering

نویسندگان

  • Christos Boutsidis
  • Anastasios Zouzias
  • Petros Drineas
چکیده

This paper discusses the topic of dimensionality reduction for k-means clustering. We prove that any set of n points in d dimensions (rows in a matrix A ∈ R) can be projected into t = Ω(k/ε) dimensions, for any ε ∈ (0, 1/3), in O(nd⌈ε−2k/ log(d)⌉) time, such that with constant probability the optimal k-partition of the point set is preserved within a factor of 2 + ε. The projection is done by post-multiplying A with a d × t random matrix R having entries +1/ √ t or −1/ √ t with equal probability. A numerical implementation of our technique and experiments on a large face images dataset verify the speed and the accuracy of our theoretical results.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Hybrid Data Clustering Algorithm Using Modified Krill Herd Algorithm and K-MEANS

Data clustering is the process of partitioning a set of data objects into meaning clusters or groups. Due to the vast usage of clustering algorithms in many fields, a lot of research is still going on to find the best and efficient clustering algorithm. K-means is simple and easy to implement, but it suffers from initialization of cluster center and hence trapped in local optimum. In this paper...

متن کامل

Randomized Dimensionality Reduction for k-Means Clustering

We study the topic of dimensionality reduction for k-means clustering. Dimensionality reduction encompasses the union of two approaches: 1) feature selection and 2) feature extraction. A feature selection-based algorithm for k-means clustering selects a small subset of the input features and then applies k-means clustering on the selected features. A feature extraction-based algorithm for k-mea...

متن کامل

Dimensionality Reduction for Distance Based Video Clustering

Clustering of video sequences is essential in order to perform video summarization. Because of the high spatial and temporal dimensions of the video data, dimensionality reduction becomes imperative before performing Euclidean distance based clustering. In this paper, we present non-adaptive dimensionality reduction approaches using random projections on the video data. Assuming the data to be ...

متن کامل

Iterative random projections for high-dimensional data clustering

In this text we propose a method which efficiently performs clustering of high-dimensional data. The method builds on random projection and the Kmeans algorithm. The idea is to apply K-means several times, increasing the dimensionality of the data after each convergence of K-means. We compare the proposed algorithm on four high-dimensional datasets, image, text and two synthetic, with K-means c...

متن کامل

Persistent K-Means: Stable Data Clustering Algorithm Based on K-Means Algorithm

Identifying clusters or clustering is an important aspect of data analysis. It is the task of grouping a set of objects in such a way those objects in the same group/cluster are more similar in some sense or another. It is a main task of exploratory data mining, and a common technique for statistical data analysis This paper proposed an improved version of K-Means algorithm, namely Persistent K...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010